Overview

Dataset statistics

Number of variables: 14
Number of observations: 506
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 55.5 KiB
Average record size in memory: 112.3 B
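
As a rough consistency check, the average record size can be recovered from the total memory size and the number of observations. The sketch below assumes that 1 KiB is 1024 bytes and that the report divides total memory by the row count; the variable names are illustrative only.

```python
# Figures copied from the dataset statistics above.
total_kib = 55.5        # total size in memory, KiB
n_observations = 506    # number of rows

# Assumption: average record size = total bytes / number of rows.
avg_record_bytes = total_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B per record")
```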

Variable types

NUM: 13
BOOL: 1

Warnings

TAX is highly correlated with RAD (High correlation)
RAD is highly correlated with TAX (High correlation)
ZN has 372 (73.5%) zeros (Zeros)

Reproduction

Analysis started: 2020-09-18 13:31:17.421633
Analysis finished: 2020-09-18 13:31:44.848145
Duration: 27.43 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

CRIM
Real number (ℝ≥0)

Distinct: 504
Distinct (%): 99.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.613523557
Minimum: 0.00632
Maximum: 88.9762
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0.00632
5-th percentile: 0.02791
Q1: 0.082045
Median: 0.25651
Q3: 3.6770825
95-th percentile: 15.78915
Maximum: 88.9762
Range: 88.96988
Interquartile range (IQR): 3.5950375

Descriptive statistics

Standard deviation: 8.601545105
Coefficient of variation (CV): 2.380376098
Kurtosis: 37.13050913
Mean: 3.613523557
Median Absolute Deviation (MAD): 0.22145
Skewness: 5.223148798
Sum: 1828.44292
Variance: 73.9865782
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
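
Several of the CRIM statistics above are related by simple arithmetic, which gives a quick way to sanity-check the report. The sketch below uses only figures copied from this section and assumes the usual definitions (mean = sum / n, variance = standard deviation squared, CV = standard deviation / mean, range = maximum - minimum, IQR = Q3 - Q1); the variable names are illustrative.

```python
import math

# Values copied from the CRIM statistics in this report.
n = 506
total = 1828.44292
mean = 3.613523557
std = 8.601545105
minimum, maximum = 0.00632, 88.9762
q1, q3 = 0.082045, 3.6770825

# Quantities derived from the definitions assumed in the lead-in.
derived = {
    "mean": total / n,
    "variance": std ** 2,
    "cv": std / mean,
    "range": maximum - minimum,
    "iqr": q3 - q1,
}

# The corresponding figures as printed in the report.
reported = {
    "mean": 3.613523557,
    "variance": 73.9865782,
    "cv": 2.380376098,
    "range": 88.96988,
    "iqr": 3.5950375,
}

for name, value in derived.items():
    agrees = math.isclose(value, reported[name], rel_tol=1e-6)
    print(f"{name}: derived={value:.9g} reported={reported[name]} agrees={agrees}")
```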
Common values
Value               Count  Frequency (%)
14.3337             2      0.4%
0.01501             2      0.4%
0.08265             1      0.2%
0.537               1      0.2%
1.35472             1      0.2%
0.14103             1      0.2%
0.03502             1      0.2%
0.03615             1      0.2%
0.66351             1      0.2%
0.1265              1      0.2%
Other values (494)  494    97.6%

Minimum 5 values
Value               Count  Frequency (%)
0.00632             1      0.2%
0.00906             1      0.2%
0.01096             1      0.2%
0.01301             1      0.2%
0.01311             1      0.2%

Maximum 5 values
Value               Count  Frequency (%)
88.9762             1      0.2%
73.5341             1      0.2%
67.9208             1      0.2%
51.1358             1      0.2%
45.7461             1      0.2%

ZN
Real number (ℝ≥0)

ZEROS

Distinct: 26
Distinct (%): 5.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.36363636
Minimum: 0
Maximum: 100
Zeros: 372
Zeros (%): 73.5%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 12.5
95-th percentile: 80
Maximum: 100
Range: 100
Interquartile range (IQR): 12.5

Descriptive statistics

Standard deviation: 23.32245299
Coefficient of variation (CV): 2.052375864
Kurtosis: 4.031510084
Mean: 11.36363636
Median Absolute Deviation (MAD): 0
Skewness: 2.225666323
Sum: 5750
Variance: 543.9368137
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=26)
Common values
Value               Count  Frequency (%)
0                   372    73.5%
20                  21     4.2%
80                  15     3.0%
12.5                10     2.0%
22                  10     2.0%
25                  10     2.0%
40                  7      1.4%
45                  6      1.2%
30                  6      1.2%
90                  5      1.0%
Other values (16)   44     8.7%

Minimum 5 values
Value               Count  Frequency (%)
0                   372    73.5%
12.5                10     2.0%
17.5                1      0.2%
18                  1      0.2%
20                  21     4.2%

Maximum 5 values
Value               Count  Frequency (%)
100                 1      0.2%
95                  4      0.8%
90                  5      1.0%
85                  2      0.4%
82.5                2      0.4%

INDUS
Real number (ℝ≥0)

Distinct: 76
Distinct (%): 15.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.13677866
Minimum: 0.46
Maximum: 27.74
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0.46
5-th percentile: 2.18
Q1: 5.19
Median: 9.69
Q3: 18.1
95-th percentile: 21.89
Maximum: 27.74
Range: 27.28
Interquartile range (IQR): 12.91

Descriptive statistics

Standard deviation: 6.860352941
Coefficient of variation (CV): 0.6160087358
Kurtosis: -1.233539601
Mean: 11.13677866
Median Absolute Deviation (MAD): 6.32
Skewness: 0.2950215679
Sum: 5635.21
Variance: 47.06444247
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
18.1                132    26.1%
19.58               30     5.9%
8.14                22     4.3%
6.2                 18     3.6%
21.89               15     3.0%
9.9                 12     2.4%
3.97                12     2.4%
8.56                11     2.2%
10.59               11     2.2%
5.86                10     2.0%
Other values (66)   233    46.0%

Minimum 5 values
Value               Count  Frequency (%)
0.46                1      0.2%
0.74                1      0.2%
1.21                1      0.2%
1.22                1      0.2%
1.25                2      0.4%

Maximum 5 values
Value               Count  Frequency (%)
27.74               5      1.0%
25.65               7      1.4%
21.89               15     3.0%
19.58               30     5.9%
18.1                132    26.1%

CHAS
Boolean

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value               Count  Frequency (%)
0                   471    93.1%
1                   35     6.9%

NOX
Real number (ℝ≥0)

Distinct: 81
Distinct (%): 16.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5546950593
Minimum: 0.385
Maximum: 0.871
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0.385
5-th percentile: 0.40925
Q1: 0.449
Median: 0.538
Q3: 0.624
95-th percentile: 0.74
Maximum: 0.871
Range: 0.486
Interquartile range (IQR): 0.175

Descriptive statistics

Standard deviation: 0.1158776757
Coefficient of variation (CV): 0.2089033853
Kurtosis: -0.06466713337
Mean: 0.5546950593
Median Absolute Deviation (MAD): 0.0875
Skewness: 0.7293079225
Sum: 280.6757
Variance: 0.01342763572
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
0.538               23     4.5%
0.713               18     3.6%
0.437               17     3.4%
0.871               16     3.2%
0.489               15     3.0%
0.624               15     3.0%
0.693               14     2.8%
0.605               14     2.8%
0.74                13     2.6%
0.544               12     2.4%
Other values (71)   349    69.0%

Minimum 5 values
Value               Count  Frequency (%)
0.385               1      0.2%
0.389               1      0.2%
0.392               2      0.4%
0.394               1      0.2%
0.398               2      0.4%

Maximum 5 values
Value               Count  Frequency (%)
0.871               16     3.2%
0.77                8      1.6%
0.74                13     2.6%
0.718               6      1.2%
0.713               18     3.6%

RM
Real number (ℝ≥0)

Distinct: 446
Distinct (%): 88.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.284634387
Minimum: 3.561
Maximum: 8.78
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 3.561
5-th percentile: 5.314
Q1: 5.8855
Median: 6.2085
Q3: 6.6235
95-th percentile: 7.5875
Maximum: 8.78
Range: 5.219
Interquartile range (IQR): 0.738

Descriptive statistics

Standard deviation: 0.7026171434
Coefficient of variation (CV): 0.1117992074
Kurtosis: 1.891500366
Mean: 6.284634387
Median Absolute Deviation (MAD): 0.3455
Skewness: 0.4036121333
Sum: 3180.025
Variance: 0.4936708502
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
6.167               3      0.6%
6.229               3      0.6%
6.127               3      0.6%
5.713               3      0.6%
6.417               3      0.6%
6.405               3      0.6%
6.38                2      0.4%
5.304               2      0.4%
5.983               2      0.4%
7.185               2      0.4%
Other values (436)  480    94.9%

Minimum 5 values
Value               Count  Frequency (%)
3.561               1      0.2%
3.863               1      0.2%
4.138               2      0.4%
4.368               1      0.2%
4.519               1      0.2%

Maximum 5 values
Value               Count  Frequency (%)
8.78                1      0.2%
8.725               1      0.2%
8.704               1      0.2%
8.398               1      0.2%
8.375               1      0.2%

AGE
Real number (ℝ≥0)

Distinct: 356
Distinct (%): 70.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 68.57490119
Minimum: 2.9
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 2.9
5-th percentile: 17.725
Q1: 45.025
Median: 77.5
Q3: 94.075
95-th percentile: 100
Maximum: 100
Range: 97.1
Interquartile range (IQR): 49.05

Descriptive statistics

Standard deviation: 28.14886141
Coefficient of variation (CV): 0.410483441
Kurtosis: -0.9677155942
Mean: 68.57490119
Median Absolute Deviation (MAD): 19.55
Skewness: -0.5989626399
Sum: 34698.9
Variance: 792.3583985
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
100                 43     8.5%
96                  4      0.8%
98.2                4      0.8%
95.4                4      0.8%
97.9                4      0.8%
87.9                4      0.8%
98.8                4      0.8%
94.1                3      0.6%
88                  3      0.6%
21.4                3      0.6%
Other values (346)  430    85.0%

Minimum 5 values
Value               Count  Frequency (%)
2.9                 1      0.2%
6                   1      0.2%
6.2                 1      0.2%
6.5                 1      0.2%
6.6                 2      0.4%

Maximum 5 values
Value               Count  Frequency (%)
100                 43     8.5%
99.3                1      0.2%
99.1                1      0.2%
98.9                3      0.6%
98.8                4      0.8%

DIS
Real number (ℝ≥0)

Distinct: 412
Distinct (%): 81.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.795042688
Minimum: 1.1296
Maximum: 12.1265
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1.1296
5-th percentile: 1.461975
Q1: 2.100175
Median: 3.20745
Q3: 5.188425
95-th percentile: 7.8278
Maximum: 12.1265
Range: 10.9969
Interquartile range (IQR): 3.08825

Descriptive statistics

Standard deviation: 2.105710127
Coefficient of variation (CV): 0.5548580872
Kurtosis: 0.4879411222
Mean: 3.795042688
Median Absolute Deviation (MAD): 1.29115
Skewness: 1.011780579
Sum: 1920.2916
Variance: 4.434015137
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
3.4952              5      1.0%
5.2873              4      0.8%
5.4007              4      0.8%
5.7209              4      0.8%
6.8147              4      0.8%
3.6519              3      0.6%
7.3172              3      0.6%
5.4917              3      0.6%
7.8278              3      0.6%
5.4159              3      0.6%
Other values (402)  470    92.9%

Minimum 5 values
Value               Count  Frequency (%)
1.1296              1      0.2%
1.137               1      0.2%
1.1691              1      0.2%
1.1742              1      0.2%
1.1781              1      0.2%

Maximum 5 values
Value               Count  Frequency (%)
12.1265             1      0.2%
10.7103             2      0.4%
10.5857             2      0.4%
9.2229              1      0.2%
9.2203              2      0.4%

RAD
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 9
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.549407115
Minimum: 1
Maximum: 24
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
Median: 5
Q3: 24
95-th percentile: 24
Maximum: 24
Range: 23
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 8.707259384
Coefficient of variation (CV): 0.9118115166
Kurtosis: -0.8672319936
Mean: 9.549407115
Median Absolute Deviation (MAD): 2
Skewness: 1.004814648
Sum: 4832
Variance: 75.81636598
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Common values
Value               Count  Frequency (%)
24                  132    26.1%
5                   115    22.7%
4                   110    21.7%
3                   38     7.5%
6                   26     5.1%
8                   24     4.7%
2                   24     4.7%
1                   20     4.0%
7                   17     3.4%

Minimum 5 values
Value               Count  Frequency (%)
1                   20     4.0%
2                   24     4.7%
3                   38     7.5%
4                   110    21.7%
5                   115    22.7%

Maximum 5 values
Value               Count  Frequency (%)
24                  132    26.1%
8                   24     4.7%
7                   17     3.4%
6                   26     5.1%
5                   115    22.7%

TAX
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 66
Distinct (%): 13.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 408.2371542
Minimum: 187
Maximum: 711
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 187
5-th percentile: 222
Q1: 279
Median: 330
Q3: 666
95-th percentile: 666
Maximum: 711
Range: 524
Interquartile range (IQR): 387

Descriptive statistics

Standard deviation: 168.5371161
Coefficient of variation (CV): 0.4128411987
Kurtosis: -1.142407992
Mean: 408.2371542
Median Absolute Deviation (MAD): 73
Skewness: 0.6699559418
Sum: 206568
Variance: 28404.75949
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
666                 132    26.1%
307                 40     7.9%
403                 30     5.9%
437                 15     3.0%
304                 14     2.8%
264                 12     2.4%
398                 12     2.4%
277                 11     2.2%
384                 11     2.2%
330                 10     2.0%
Other values (56)   219    43.3%

Minimum 5 values
Value               Count  Frequency (%)
187                 1      0.2%
188                 7      1.4%
193                 8      1.6%
198                 1      0.2%
216                 5      1.0%

Maximum 5 values
Value               Count  Frequency (%)
711                 5      1.0%
666                 132    26.1%
469                 1      0.2%
437                 15     3.0%
432                 9      1.8%

PTRATIO
Real number (ℝ≥0)

Distinct: 46
Distinct (%): 9.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.4555336
Minimum: 12.6
Maximum: 22
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 12.6
5-th percentile: 14.7
Q1: 17.4
Median: 19.05
Q3: 20.2
95-th percentile: 21
Maximum: 22
Range: 9.4
Interquartile range (IQR): 2.8

Descriptive statistics

Standard deviation: 2.164945524
Coefficient of variation (CV): 0.1173060379
Kurtosis: -0.2850913833
Mean: 18.4555336
Median Absolute Deviation (MAD): 1.15
Skewness: -0.8023249269
Sum: 9338.5
Variance: 4.686989121
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=46)
Common values
Value               Count  Frequency (%)
20.2                140    27.7%
14.7                34     6.7%
21                  27     5.3%
17.8                23     4.5%
19.2                19     3.8%
17.4                18     3.6%
18.6                17     3.4%
19.1                17     3.4%
16.6                16     3.2%
18.4                16     3.2%
Other values (36)   179    35.4%

Minimum 5 values
Value               Count  Frequency (%)
12.6                3      0.6%
13                  12     2.4%
13.6                1      0.2%
14.4                1      0.2%
14.7                34     6.7%

Maximum 5 values
Value               Count  Frequency (%)
22                  2      0.4%
21.2                15     3.0%
21.1                1      0.2%
21                  27     5.3%
20.9                11     2.2%

B
Real number (ℝ≥0)

Distinct: 357
Distinct (%): 70.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 356.6740316
Minimum: 0.32
Maximum: 396.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0.32
5-th percentile: 84.59
Q1: 375.3775
Median: 391.44
Q3: 396.225
95-th percentile: 396.9
Maximum: 396.9
Range: 396.58
Interquartile range (IQR): 20.8475

Descriptive statistics

Standard deviation: 91.29486438
Coefficient of variation (CV): 0.255961624
Kurtosis: 7.226817549
Mean: 356.6740316
Median Absolute Deviation (MAD): 5.46
Skewness: -2.890373712
Sum: 180477.06
Variance: 8334.752263
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
396.9               121    23.9%
395.24              3      0.6%
393.74              3      0.6%
393.23              2      0.4%
394.72              2      0.4%
396.21              2      0.4%
395.69              2      0.4%
396.06              2      0.4%
395.63              2      0.4%
395.6               2      0.4%
Other values (347)  365    72.1%

Minimum 5 values
Value               Count  Frequency (%)
0.32                1      0.2%
2.52                1      0.2%
2.6                 1      0.2%
3.5                 1      0.2%
3.65                1      0.2%

Maximum 5 values
Value               Count  Frequency (%)
396.9               121    23.9%
396.42              1      0.2%
396.33              1      0.2%
396.3               1      0.2%
396.28              1      0.2%

LSTAT
Real number (ℝ≥0)

Distinct: 455
Distinct (%): 89.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.65306324
Minimum: 1.73
Maximum: 37.97
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1.73
5-th percentile: 3.7075
Q1: 6.95
Median: 11.36
Q3: 16.955
95-th percentile: 26.8075
Maximum: 37.97
Range: 36.24
Interquartile range (IQR): 10.005

Descriptive statistics

Standard deviation: 7.141061511
Coefficient of variation (CV): 0.5643741263
Kurtosis: 0.4932395174
Mean: 12.65306324
Median Absolute Deviation (MAD): 4.795
Skewness: 0.9064600936
Sum: 6402.45
Variance: 50.99475951
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
14.1                3      0.6%
6.36                3      0.6%
18.13               3      0.6%
8.05                3      0.6%
7.79                3      0.6%
9.5                 2      0.4%
4.59                2      0.4%
3.76                2      0.4%
17.27               2      0.4%
10.11               2      0.4%
Other values (445)  481    95.1%

Minimum 5 values
Value               Count  Frequency (%)
1.73                1      0.2%
1.92                1      0.2%
1.98                1      0.2%
2.47                1      0.2%
2.87                1      0.2%

Maximum 5 values
Value               Count  Frequency (%)
37.97               1      0.2%
36.98               1      0.2%
34.77               1      0.2%
34.41               1      0.2%
34.37               1      0.2%

MEDV
Real number (ℝ≥0)

Distinct: 229
Distinct (%): 45.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.53280632
Minimum: 5
Maximum: 50
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 5
5-th percentile: 10.2
Q1: 17.025
Median: 21.2
Q3: 25
95-th percentile: 43.4
Maximum: 50
Range: 45
Interquartile range (IQR): 7.975

Descriptive statistics

Standard deviation: 9.197104087
Coefficient of variation (CV): 0.408165053
Kurtosis: 1.495196944
Mean: 22.53280632
Median Absolute Deviation (MAD): 4
Skewness: 1.108098408
Sum: 11401.6
Variance: 84.58672359
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
50                  16     3.2%
25                  8      1.6%
23.1                7      1.4%
21.7                7      1.4%
22                  7      1.4%
20.6                6      1.2%
19.4                6      1.2%
20.1                5      1.0%
19.6                5      1.0%
19.3                5      1.0%
Other values (219)  434    85.8%

Minimum 5 values
Value               Count  Frequency (%)
5                   2      0.4%
5.6                 1      0.2%
6.3                 1      0.2%
7                   2      0.4%
7.2                 3      0.6%

Maximum 5 values
Value               Count  Frequency (%)
50                  16     3.2%
48.8                1      0.2%
48.5                1      0.2%
48.3                1      0.2%
46.7                1      0.2%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
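
The calculation described above can be sketched in plain Python. This is a minimal illustration rather than the code the report itself runs; `pearson_r` is a hypothetical helper name. Because the same normalizing factor appears in the covariance and in both standard deviations, it cancels, so the sketch works directly with sums of centered products.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centered products; the 1/(n-1) factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    # A constant input has zero variance, so this division raises ZeroDivisionError.
    return cov / math.sqrt(ss_x * ss_y)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(x, y))                       # perfectly linear, increasing
print(pearson_r(x, [3 * v + 5 for v in y]))  # unchanged by shifting and rescaling y
```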

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
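
A minimal sketch of this rank-based calculation follows. It assumes ties receive the average of the ranks they span, which is a common convention but is not stated in this report; `average_ranks` and `spearman_rho` are hypothetical helper names.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear in x
print(spearman_rho(x, y))
```

In the usage example, y is a strictly increasing but nonlinear function of x, so the ranks of x and y coincide and ρ reaches its maximum even though the relationship is not a straight line.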

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
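
The pair-counting definition above corresponds to the variant usually called tau-a, sketched below. Library implementations often apply a tie correction (tau-b) instead, so values can differ from this sketch when the data contain ties; `kendall_tau_a` is a hypothetical helper name.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # direction == 0 means a tie in x or y; it counts toward neither.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Five concordant pairs and one discordant pair out of six.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```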

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

Index  CRIM     ZN    INDUS  CHAS  NOX    RM     AGE    DIS     RAD  TAX    PTRATIO  B       LSTAT  MEDV
0      0.00632  18.0  2.31   0.0   0.538  6.575  65.2   4.0900  1.0  296.0  15.3     396.90  4.98   24.0
1      0.02731  0.0   7.07   0.0   0.469  6.421  78.9   4.9671  2.0  242.0  17.8     396.90  9.14   21.6
2      0.02729  0.0   7.07   0.0   0.469  7.185  61.1   4.9671  2.0  242.0  17.8     392.83  4.03   34.7
3      0.03237  0.0   2.18   0.0   0.458  6.998  45.8   6.0622  3.0  222.0  18.7     394.63  2.94   33.4
4      0.06905  0.0   2.18   0.0   0.458  7.147  54.2   6.0622  3.0  222.0  18.7     396.90  5.33   36.2
5      0.02985  0.0   2.18   0.0   0.458  6.430  58.7   6.0622  3.0  222.0  18.7     394.12  5.21   28.7
6      0.08829  12.5  7.87   0.0   0.524  6.012  66.6   5.5605  5.0  311.0  15.2     395.60  12.43  22.9
7      0.14455  12.5  7.87   0.0   0.524  6.172  96.1   5.9505  5.0  311.0  15.2     396.90  19.15  27.1
8      0.21124  12.5  7.87   0.0   0.524  5.631  100.0  6.0821  5.0  311.0  15.2     386.63  29.93  16.5
9      0.17004  12.5  7.87   0.0   0.524  6.004  85.9   6.5921  5.0  311.0  15.2     386.71  17.10  18.9

Last rows

Index  CRIM     ZN    INDUS  CHAS  NOX    RM     AGE    DIS     RAD  TAX    PTRATIO  B       LSTAT  MEDV
496    0.28960  0.0   9.69   0.0   0.585  5.390  72.9   2.7986  6.0  391.0  19.2     396.90  21.14  19.7
497    0.26838  0.0   9.69   0.0   0.585  5.794  70.6   2.8927  6.0  391.0  19.2     396.90  14.10  18.3
498    0.23912  0.0   9.69   0.0   0.585  6.019  65.3   2.4091  6.0  391.0  19.2     396.90  12.92  21.2
499    0.17783  0.0   9.69   0.0   0.585  5.569  73.5   2.3999  6.0  391.0  19.2     395.77  15.10  17.5
500    0.22438  0.0   9.69   0.0   0.585  6.027  79.7   2.4982  6.0  391.0  19.2     396.90  14.33  16.8
501    0.06263  0.0   11.93  0.0   0.573  6.593  69.1   2.4786  1.0  273.0  21.0     391.99  9.67   22.4
502    0.04527  0.0   11.93  0.0   0.573  6.120  76.7   2.2875  1.0  273.0  21.0     396.90  9.08   20.6
503    0.06076  0.0   11.93  0.0   0.573  6.976  91.0   2.1675  1.0  273.0  21.0     396.90  5.64   23.9
504    0.10959  0.0   11.93  0.0   0.573  6.794  89.3   2.3889  1.0  273.0  21.0     393.45  6.48   22.0
505    0.04741  0.0   11.93  0.0   0.573  6.030  80.8   2.5050  1.0  273.0  21.0     396.90  7.88   11.9